PALMA: Perfect Alignments using Large Margin Algorithms

Authors

  • Gunnar Rätsch
  • Bettina Hepp
  • Uta Schulze
  • Cheng Soon Ong
Abstract

Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. We present a novel approach based on large margin learning that combines kernel-based splice site predictions with common sequence alignment techniques. By solving a convex optimization problem, our algorithm – called PALMA – tunes the parameters of the model such that the true alignment scores higher than all other alignments. In an experimental study on the alignments of mRNAs containing artificially generated micro-exons, we show that our algorithm drastically outperforms all other methods: it perfectly aligns all 4358 sequences on a hold-out set, while the best other method misaligns at least 90 of them. Moreover, our algorithm is very robust against noise in the query sequence: when deleting, inserting, or mutating up to 50% of the query sequence, it still aligns 95% of all sequences correctly, while other methods achieve less than 36% accuracy. For datasets, additional results and a stand-alone alignment tool see http://www.fml.mpg.de/raetsch/projects/palma.
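The large margin idea in the abstract (tune parameters so that the true alignment scores higher than every other alignment) can be illustrated with a small sketch. This is not the PALMA implementation: the feature vectors (counts of matches, mismatches and gaps), the fixed lists of competing alignments, the function names and the plain subgradient descent solver are all assumptions made for illustration. The abstract only states that a convex optimization problem is solved and that splice site predictions enter the model.

```python
from typing import List, Sequence

Vector = List[float]


def score(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear alignment score: dot product of parameters and features."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def train_large_margin(
    true_feats: List[Vector],
    wrong_feats: List[List[Vector]],
    reg: float = 0.01,
    lr: float = 0.01,
    epochs: int = 500,
) -> Vector:
    """Minimise reg/2 * ||w||^2 + sum_i max(0, 1 + max_a' w.phi(a') - w.phi(a_i)).

    true_feats[i]  : features of the true alignment of example i (hypothetical)
    wrong_feats[i] : features of competing alignments of example i (hypothetical)
    The objective is convex in w; it is minimised here by subgradient descent.
    """
    dim = len(true_feats[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * wi for wi in w]  # gradient of the regulariser
        for phi_true, competitors in zip(true_feats, wrong_feats):
            # Highest-scoring wrong alignment under the current parameters.
            worst = max(competitors, key=lambda phi: score(w, phi))
            # Margin violated: push the true alignment up, the competitor down.
            if 1.0 + score(w, worst) - score(w, phi_true) > 0.0:
                for k in range(dim):
                    grad[k] += worst[k] - phi_true[k]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy data, invented for this sketch: features are [matches, mismatches, gaps].
true_feats = [[8.0, 1.0, 1.0], [5.0, 0.0, 2.0]]
wrong_feats = [
    [[6.0, 3.0, 1.0], [7.0, 0.0, 3.0]],
    [[4.0, 2.0, 1.0], [3.0, 1.0, 3.0]],
]
w = train_large_margin(true_feats, wrong_feats)
```

In a real aligner the set of competing alignments is far too large to enumerate, so the highest-scoring competitor would typically be found with a dynamic programming alignment step rather than with `max` over a list.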

Similar resources

PALMA: mRNA to genome alignments using large margin algorithms

MOTIVATION: Despite many years of research on how to properly align sequences in the presence of sequencing errors, alternative splicing and micro-exons, the correct alignment of mRNA sequences to genomic DNA is still a challenging task. RESULTS: We present a novel approach based on large margin learning that combines accurate splice site predictions with common sequence alignment techniques. B...

Liquid-liquid equilibrium data prediction using large margin nearest neighbor

Guanidine hydrochloride has been widely used in the initial recovery steps of active protein from the inclusion bodies in aqueous two-phase system (ATPS). The knowledge of the guanidine hydrochloride effects on the liquid-liquid equilibrium (LLE) phase diagram behavior is still inadequate and no comprehensive theory exists for the prediction of the experimental trends. Therefore the effect the ...

Phoneme Alignment using Large Margin Techniques

We propose an alignment method which is based on recent advances in kernel machines and large margin classifiers for sequences [13, 12], which in turn build on the pioneering work of Vapnik and colleagues [15, 4]. The alignment function we devise is based on mapping the speech signal and its phoneme representation along with the target alignment into an abstract vector-space. Building on techni...

A Fast Algorithm for the Computation and Enumeration of Perfect Phylogenies

The perfect phylogeny problem is a classical problem in computational evolutionary biology, in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed parameter versions can each be solved in polynomial time. In particular, Agarwala and Fernández-Baca have developed an O(2(nk...

Homomesy of Alignments in Perfect Matchings

We investigate the existence of a group action τ that is homomesic with respect to alignments, a type of statistic in perfect matchings. Homomesy is defined as the consistency of an average, and perfect matchings are defined as the set of all partitions of 1 to 2n into pairs. We take advantage of the bijection between labeled Dyck paths and perfect matchings to investigate the po...

Publication date: 2006